端點偵測技術在強健語音參數擷取之研究 (Study of the Voice Activity Detection Techniques for Robust Speech Feature Extraction) [In Chinese]
Abstract
The performance of a speech recognition system often degrades because of a mismatch between the development and application environments. One major source of this mismatch is additive noise. Approaches to handling additive noise can be divided into three classes: speech enhancement, robust speech feature extraction, and compensation of speech models. This thesis focuses on the second class, robust speech feature extraction. Robust feature extraction approaches are often paired with voice activity detection in order to estimate the noise characteristics. A voice activity detector (VAD) discriminates between the speech and noise-only portions of an utterance. This thesis primarily investigates the effectiveness of various features for the VAD: low-frequency spectral magnitude (LFSM), full-band spectral magnitude (FBSM), cumulative quantized spectrum (CQS), and high-pass log-energy (HPLE). The resulting VAD supplies noise information to two noise-robustness techniques, spectral subtraction (SS) and silence log-energy normalization (SLEN), in order to reduce the influence of additive noise on speech recognition. The recognition experiments are conducted on the Aurora-2 database. Experimental results show that the proposed VAD provides accurate noise information, with which the subsequent processes, SS and SLEN, significantly improve speech recognition performance in various noise-corrupted environments. We therefore confirm that an appropriate selection of VAD features implicitly improves the noise robustness of a speech recognition system.
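The pipeline described in the abstract (a VAD marks noise-only frames, and those frames supply the noise estimate for spectral subtraction) can be illustrated with a minimal sketch. This is not the thesis's method: the abstract does not define LFSM, FBSM, CQS, or HPLE, so the sketch assumes a plain frame log-energy threshold as the VAD feature, and the function names, frame sizes, and the `margin_db`, `alpha`, and `beta` parameters are all hypothetical choices made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def energy_vad(frames, margin_db=6.0, n_init=5):
    """Assumed VAD: a frame counts as speech when its log-energy exceeds
    the noise floor (mean of the first n_init frames) by margin_db."""
    log_e = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.mean(log_e[:n_init])
    return log_e > noise_floor + margin_db

def spectral_subtraction(frames, is_speech, alpha=1.0, beta=0.01):
    """Subtract the average noise magnitude spectrum, estimated from the
    frames the VAD marked as noise-only, with a spectral floor beta."""
    if not np.any(~is_speech):
        raise ValueError("VAD found no noise-only frames to estimate noise")
    window = np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(frames * window, axis=1))
    noise_mag = mag[~is_speech].mean(axis=0)
    cleaned = np.maximum(mag - alpha * noise_mag, beta * mag)
    return mag, cleaned

# Synthetic test signal: white noise with a 440 Hz tone burst standing in
# for speech (an assumption; the thesis uses the Aurora-2 database).
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
signal = 0.05 * rng.standard_normal(fs)
signal[3000:5000] += np.sin(2 * np.pi * 440 * t[3000:5000])

frames = frame_signal(signal)
is_speech = energy_vad(frames)
mag, cleaned = spectral_subtraction(frames, is_speech)
```

The design point the abstract makes is visible here: the quality of `noise_mag` depends entirely on which frames `is_speech` excludes, so a more discriminative VAD feature directly improves the subtraction step.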
Similar resources
強健性語音辨識中分頻段調變頻譜補償之研究 (A Study of Sub-band Modulation Spectrum Compensation for Robust Speech Recognition) [In Chinese]
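Although speech technology has advanced rapidly, automatic speech recognition remains a topic worth further research and development. Most current speech recognition systems are deployed in quiet, interference-free environments, where they achieve quite satisfactory results; when they are applied in real environments, however, the speech signal is often affected by environmental noise, and recognition performance degrades markedly. Robustness techniques, developed over many years, aim to remedy this shortcoming. Among the many robustness techniques, one class of methods statistically normalizes the speech features. Traditionally, these methods normalize the full-band temporal sequences of speech features. When the effectiveness of such methods is analyzed, however, the degree to which the modulation spectrum is normalized is usually taken as the measure of performance, so normalizing the modulation spectrum of the speech features directly should also give good results. In addition, modulation frequency components at different frequencies are not equally important, yet conventional feature time-sequence normalization methods largely ignore this property. Based on these observations, this thesis proposes a series of sub-band modulation spectrum statistics normalization methods, which normalize the statistical characteristics of the different sub-bands separately and thereby improve the speech...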
最小變異數調變頻譜濾波器於強健性語音辨識之研究 (A Study of Minimum Variance Modulation Filter for Robust Speech Recognition) [In Chinese]
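This thesis investigates speech feature robustness techniques for improving speech recognition performance in noisy environments. We take the environmental-distortion objective function designed in the original minimum variance modulation filter method and apply it to finding the optimal frequency response of the filter, from which we develop two algorithms for deriving feature time-sequence filters: minimum-variance-based least-squares spectrum fitting (MV-LSSF) and minimum-variance-based magnitude spectrum interpolation (MV-MSI). In both methods, the optimal filter frequency response we obtain replaces the filter used in the original least-squares spectrum fitting (LSSF) and magnitude spectrum interpolation (MSI) methods in order to obtain the target power spectral density to be approximated. Experimental results on the Aurora-2 connected-digit database confirm that these two minimum-variance-based modulation spectrum normalization methods outperform the two conventional modulation spectrum normalization methods in all noise environments and achieve higher recognition accuracy. Compared with the baseline results, MV-LSSF and MV-MSI achieve relative error reduction rates of 55.41% and 51.20%, respectively, ...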
雜訊環境下應用線性估測編碼於特徵時序列之強健性語音辨識 (Employing linear prediction coding in feature time sequences for robust speech recognition in noisy environments) [In Chinese]
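Over the past few decades, numerous researchers have proposed a wealth of algorithms for this noise interference problem. These fall roughly into two categories: robust speech feature representation and speech model adaptation. Methods in the first category aim to extract speech features that are not easily distorted by the external environment, or to reduce as far as possible the effects of noise on the original speech features. Well-known methods include cepstral mean and variance normalization (CMVN) [1], cepstral histogram normalization (CHN) [2], cepstral mean and variance normalization plus auto-r...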
運用類神經網路方法之語言端點偵測研究 (A Study on Voice Activation Detection by Using Neural Networks) [In Chinese]
This study uses a deep neural network (DNN) for voice activation detection (VAD) and examines the following variables that affect VAD performance: (1) the analysis window size used in MFCC feature extraction, (2) the number of DNN layers, (3) the signal-to-noise ratio, and (4) the type of background condition. The experiments use the NTPU Noise Corpus, which is mixed by many kinds of background noise re...
併合式倒頻譜統計正規化技術於強健性語音辨識之研究 (A Study of Hybrid-based Cepstral Statistics Normalization Techniques for Robust Speech Recognition) [In Chinese]
Cepstral statistics normalization techniques have been shown to be very successful at improving the noise robustness of speech features. In this paper, we propose a hybrid-based scheme to achieve a more accurate estimate of the statistical information of features in these techniques. By properly integrating codebook and utterance/segment knowledge, the